A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation

Authors

  • Danny Websdale
  • Ben P. Milner
Abstract

This work proposes and compares perceptually motivated loss functions for deep-learning-based binary mask estimation in speech separation. Previous loss functions have focused on maximising the classification accuracy of mask estimation, but we now propose loss functions that aim to maximise the hit minus false-alarm (HIT-FA) rate, which is known to correlate more closely with speech intelligibility. The baseline loss function is binary cross-entropy (CE), a standard loss function used in binary mask estimation, which maximises classification accuracy. We first propose a loss function that maximises the HIT-FA rate instead of classification accuracy. We then propose a second loss function that is a hybrid of CE and HIT-FA, providing a balance between classification accuracy and HIT-FA rate. Evaluations of the perceptually motivated loss functions on the GRID database show improvements in HIT-FA rate and ESTOI across babble and factory noises. Further tests then explore the application of the perceptually motivated loss functions to a larger-vocabulary dataset.
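As a rough illustration of the three objectives described in the abstract, the sketch below (PyTorch) assumes a network that outputs sigmoid probabilities p for each time-frequency unit and an ideal-binary-mask target y, and treats the soft outputs as expected hits and false alarms. The soft HIT-FA surrogate and the weighting parameter alpha are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(p, y):
    # Baseline: binary cross-entropy, which targets classification accuracy.
    return F.binary_cross_entropy(p, y)

def hit_fa_loss(p, y, eps=1e-8):
    # Differentiable surrogate for the HIT-FA rate: with soft predictions p,
    # HIT is the mean prediction over target-dominant units (y == 1) and
    # FA the mean prediction over masker-dominant units (y == 0).
    hit = (p * y).sum() / (y.sum() + eps)
    fa = (p * (1 - y)).sum() / ((1 - y).sum() + eps)
    return -(hit - fa)  # minimising this maximises HIT - FA

def hybrid_loss(p, y, alpha=0.5):
    # Hybrid objective: alpha (hypothetical) balances CE against HIT-FA.
    return alpha * cross_entropy_loss(p, y) + (1 - alpha) * hit_fa_loss(p, y)
```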


Similar articles

ASR-driven Binary Mask Estimation for Robust Automatic Speech Recognition

Additive noise has long been an issue for robust automatic speech recognition (ASR) systems. One approach to noise robustness is the removal of noise information through segregation by binary time-frequency masks; each time-frequency unit in a spectro-temporal representation of the speech signal is labeled either noise-dominant or signal-dominant. The noise-dominant units are masked and their e...
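As a minimal illustration of the masking step described above (not the ASR-driven estimation itself), the sketch below applies a given binary mask to the noisy spectrogram and resynthesises the signal; the STFT settings and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_binary_mask(noisy, mask, fs=16000, nperseg=512):
    # STFT of the noisy mixture; the mask must be on the same T-F grid.
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    assert mask.shape == Y.shape, "mask must match the spectrogram grid"
    # Keep speech-dominant units (mask == 1), zero the noise-dominant ones,
    # then resynthesise a time-domain signal.
    _, enhanced = istft(Y * mask, fs=fs, nperseg=nperseg)
    return enhanced
```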


Introducing the New DESICA Algorithm for Blind Separation of Speech Source Signals in the Dynamic Case

Abstract: We consider a new scenario in the blind speech separation problem in which the number and the features of the active sources change with time, in contrast to previous methods in which all sources are active all the time. Accordingly, we propose the new DESICA algorithm for source separation, which is a compound of the ICA and DESPRIT algorithms. In this algorithm, using the ICA, the separat...
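For orientation only, the sketch below shows a plain FastICA unmixing step of the kind the DESICA algorithm builds on; it is generic ICA via scikit-learn, not the DESICA method itself.

```python
from sklearn.decomposition import FastICA

def ica_separate(mixtures):
    # mixtures: array of shape (n_samples, n_channels) of mixed recordings.
    ica = FastICA(n_components=mixtures.shape[1], random_state=0)
    # Estimated independent sources, shape (n_samples, n_components).
    return ica.fit_transform(mixtures)
```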


Dual-Channel Cosine Function Based ITD Estimation for Robust Speech Separation

In speech separation tasks, many separation methods have the limitation that the microphones must be closely spaced, since these methods cannot cope with phase wrap-around. In this paper, we present a novel speech separation scheme using two microphones that does not have this restriction. The technique utilizes the estimation of interaural time difference (ITD) statistics and bin...
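For context, the sketch below estimates ITD by simple cross-correlation of the two channels; it only illustrates the quantity being estimated, not the paper's cosine-function-based estimator.

```python
import numpy as np

def estimate_itd(left, right, fs=16000, max_delay_s=1e-3):
    # Full cross-correlation of the two channels and the lag (in samples)
    # associated with each correlation value.
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))
    # Restrict the search to physically plausible delays.
    valid = np.abs(lags) <= int(max_delay_s * fs)
    best_lag = lags[valid][np.argmax(corr[valid])]
    return best_lag / fs  # ITD in seconds; positive if 'left' lags 'right'
```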


On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis

What is the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this chapter I discuss different objectives used in CASA. I suggest as a main CASA goal th...
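As a concrete reference point, the sketch below constructs the ideal binary mask from premixed clean speech and noise, labelling a T-F unit 1 when its local SNR exceeds a local criterion LC; the STFT settings and LC value are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def ideal_binary_mask(clean, noise, fs=16000, nperseg=512, lc_db=0.0):
    _, _, S = stft(clean, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    # Local SNR per T-F unit; a unit is speech-dominant when it exceeds LC.
    local_snr_db = 20.0 * np.log10(np.abs(S) / (np.abs(N) + 1e-12))
    return (local_snr_db > lc_db).astype(float)
```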


Distant speech separation using predicted time-frequency masks from spatial features

Speech separation algorithms face the difficult task of producing a high degree of separation without introducing unwanted artifacts. The time-frequency (T-F) masking technique applies a real-valued (or binary) mask on top of the signal’s spectrum to filter out unwanted components. The practical difficulty lies in the mask estimation. Often, using efficient masks engineered for separation ...
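For illustration, the sketch below computes one common spatial feature, the inter-channel phase difference per T-F unit, of the kind that can drive mask prediction; the features and predictor actually used in the paper are not reproduced here.

```python
import numpy as np
from scipy.signal import stft

def interchannel_phase_difference(ch1, ch2, fs=16000, nperseg=512):
    _, _, X1 = stft(ch1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(ch2, fs=fs, nperseg=nperseg)
    # Phase difference per T-F unit (radians); a simple spatial feature
    # that a mask predictor can take as input.
    return np.angle(X1 * np.conj(X2))
```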



Journal:

Volume:   Issue:

Pages:

Publication year: 2017